{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true,
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "目录\n",
    "\n",
    "* [训练XGBoost简单demo](#训练XGBoost简单demo)\n",
    "* [交叉验证](#交叉验证)\n",
    "* [进阶操作-调整样本权重](#进阶操作-调整样本权重)\n",
    "* [进阶操作-自定义目标函数（损失函数）](#进阶操作-自定义目标函数（损失函数）)\n",
    "* [用前n棵树做预测](#用前n棵树做预测)\n",
    "* [绘制特征重要度](#绘制特征重要度)\n",
    "* [与sklearn组合使用](#与sklearn组合使用)\n",
    "* [优化超参数-网格搜索](#优化超参数-网格搜索)\n",
    "\n",
    "【注意】：\n",
    "xgboost有2套python API：1个是原生api，1个是sklearn 风格的api。为了确定估计器的数目，也就是boosting迭代的次数，也可以说是残差树的数目，\n",
    "参数名为n_estimators、num_iterations、num_round、num_boost_round都是等价的，都是num_boost_round的别名\n",
    "\n",
    "**api官方文档：**\n",
    "- [原生Python api文档](https://xgboost.readthedocs.io/en/latest/python/python_api.html#module-xgboost.training)\n",
    "- [sklearn风格api文档](https://xgboost.readthedocs.io/en/latest/python/python_api.html#module-xgboost.sklearn)\n",
    "\n",
    "下面的示例中先介绍原生api的使用，后面再介绍sklearn 风格的api的使用\n",
    "\n",
    "## 训练XGBoost简单demo\n",
    "\n",
    "先指定一组参数param，如何去训练"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "outputs": [],
   "source": [
    "# 数据是libsvm格式，很适合用于**稀疏存储**的数据：\n",
    "# 例如 0 1:1 9:1 19:1 21:1 24:1 ...\n",
    "# 第1列是target 后面是index:数值\n",
    "\n",
    "import numpy as np\n",
    "import scipy.sparse\n",
    "import pickle # python2中叫cPickle python3中改为pickle\n",
    "import xgboost as xgb\n",
    "\n",
    "cwd = '/home/xijian/pycharm_projects/Magic-NLPer/MachineLearning/'\n",
    "data_dir = cwd+'XGBoostUsage/data/'"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "outputs": [],
   "source": [
    "# 读取数据集\n",
    "# xgb.DMatrix()可以直接读取libsvm格式的数据\n",
    "dtrain = xgb.DMatrix(data_dir + 'agaricus.txt.train')\n",
    "dtest = xgb.DMatrix(data_dir + 'agaricus.txt.test')"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[0]\teval-error:0.04283\ttrain-error:0.04652\n",
      "[1]\teval-error:0.02173\ttrain-error:0.02226\n",
      "[2]\teval-error:0.00621\ttrain-error:0.00706\n",
      "[3]\teval-error:0.01800\ttrain-error:0.01520\n",
      "[4]\teval-error:0.00621\ttrain-error:0.00706\n",
      "[5]\teval-error:0.00000\ttrain-error:0.00123\n",
      "[6]\teval-error:0.00000\ttrain-error:0.00123\n",
      "[7]\teval-error:0.00000\ttrain-error:0.00123\n",
      "[8]\teval-error:0.00000\ttrain-error:0.00123\n",
      "[9]\teval-error:0.00000\ttrain-error:0.00000\n"
     ]
    }
   ],
   "source": [
    "# 设定模型参数\n",
    "param = {'max_depth':2,  # 树深\n",
    "         'eta': 1,\n",
    "         'verbosity': 0,\n",
    "         'objective': 'binary:logistic'}\n",
    "watch_list = [(dtest, 'eval'), (dtrain, 'train')]\n",
    "number_round = 10 # 跑10轮（10棵子树） # 通用参数\n",
    "# 训练\n",
    "model = xgb.train(param, dtrain, num_boost_round=number_round, evals=watch_list)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'numpy.ndarray'> float32 (1611,)\n"
     ]
    }
   ],
   "source": [
    "# 预测\n",
    "pred = model.predict(dtest) # 返回的是numpy.array\n",
    "print(type(pred), pred.dtype, pred.shape) # (1611,)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'numpy.ndarray'> float32 (1611,)\n",
      "[0. 1. 0. 0. 0. 0. 1. 0. 1. 0.]\n"
     ]
    }
   ],
   "source": [
    "# groundtruth\n",
    "labels = dtest.get_label()\n",
    "print(type(labels), labels.dtype, labels.shape)\n",
    "print(labels[:10])"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0\n"
     ]
    }
   ],
   "source": [
    "error_num = sum([i for i in range(len(pred)) if int(pred[i]>0.5)!=labels[i]])\n",
    "print(error_num)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## 交叉验证"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "outputs": [
    {
     "data": {
      "text/plain": "   train-error-mean  train-error-std  test-error-mean  test-error-std\n0          0.046522         0.001200         0.046523        0.004801\n1          0.022263         0.000867         0.022263        0.003470\n2          0.007063         0.000506         0.007062        0.002023\n3          0.015200         0.001074         0.015199        0.004295\n4          0.007063         0.000506         0.007062        0.002023",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>train-error-mean</th>\n      <th>train-error-std</th>\n      <th>test-error-mean</th>\n      <th>test-error-std</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>0.046522</td>\n      <td>0.001200</td>\n      <td>0.046523</td>\n      <td>0.004801</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>0.022263</td>\n      <td>0.000867</td>\n      <td>0.022263</td>\n      <td>0.003470</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>0.007063</td>\n      <td>0.000506</td>\n      <td>0.007062</td>\n      <td>0.002023</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>0.015200</td>\n      <td>0.001074</td>\n      <td>0.015199</td>\n      <td>0.004295</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>0.007063</td>\n      <td>0.000506</td>\n      <td>0.007062</td>\n      <td>0.002023</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "number_round = 5 # 通用参数\n",
    "# nfold=5折\n",
    "xgb.cv(param, dtrain, number_round, nfold=5, metrics={'error'}, seed=3)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## 调整样本权重"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "outputs": [
    {
     "data": {
      "text/plain": "   train-auc-mean  train-auc-std  test-auc-mean  test-auc-std\n0        0.958229       0.000999       0.958240      0.004015\n1        0.981414       0.000701       0.981429      0.002824\n2        0.997069       0.000423       0.997046      0.001723\n3        0.998758       0.000123       0.998769      0.000500\n4        0.999332       0.000097       0.999201      0.000382",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>train-auc-mean</th>\n      <th>train-auc-std</th>\n      <th>test-auc-mean</th>\n      <th>test-auc-std</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>0.958229</td>\n      <td>0.000999</td>\n      <td>0.958240</td>\n      <td>0.004015</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>0.981414</td>\n      <td>0.000701</td>\n      <td>0.981429</td>\n      <td>0.002824</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>0.997069</td>\n      <td>0.000423</td>\n      <td>0.997046</td>\n      <td>0.001723</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>0.998758</td>\n      <td>0.000123</td>\n      <td>0.998769</td>\n      <td>0.000500</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>0.999332</td>\n      <td>0.000097</td>\n      <td>0.999201</td>\n      <td>0.000382</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 看一下正负例的比例，然后调整一下权重\n",
    "def preproc(dtrain, dtest, param):\n",
    "    labels = dtrain.get_label()\n",
    "    ratio = float(np.sum(labels==0))/np.sum(labels==1)\n",
    "    param['scale_pos_ratio'] = ratio\n",
    "    return (dtrain, dtest, param)\n",
    "\n",
    "xgb.cv(param, dtrain, number_round, nfold=5, metrics={'auc'}, seed=3, fpreproc=preproc)\n",
    "# auc值越接近1效果越好"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## 进阶操作-自定义目标函数（损失函数）\n",
    "\n",
    "目标函数：预测与label的接近程度，值越大越好，梯度上升\n",
    "\n",
    "损失函数：预测与label的差距程度，值越小越好，梯度下降\n",
    "\n",
    "xgboost中如果要使用自定义目标函数，就要自己提供一阶导数和二阶导数的实现\n",
    "\n",
    "【注意】：\n",
    "在逻辑回归章节，求导时，梯度=X.T.dot(h_x-y)，为什么上面只写了(h_x-y)？\n",
    "\n",
    "因为：这里的p-y.label，其实只是梯度的“公共部分”，因为不同的样本要乘以各自的样本值，所以是无法提前算出来的，\n",
    "因此往往用p-y.label计算梯度的系数部分，实际拿到样本，再乘以各自的数据。\n"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[0]\teval-rmse:1.59229\ttrain-rmse:1.59597\teval-error:0.04283\ttrain-error:0.04652\n",
      "[1]\teval-rmse:2.40519\ttrain-rmse:2.40977\teval-error:0.02173\ttrain-error:0.02226\n",
      "[2]\teval-rmse:2.88253\ttrain-rmse:2.87459\teval-error:0.00621\ttrain-error:0.00706\n",
      "[3]\teval-rmse:3.62808\ttrain-rmse:3.63621\teval-error:0.01800\ttrain-error:0.01520\n",
      "[4]\teval-rmse:3.80794\ttrain-rmse:3.83893\teval-error:0.00621\ttrain-error:0.00706\n"
     ]
    }
   ],
   "source": [
    "# 自定义目标函数：log似然，交叉验证\n",
    "# 需要提供一阶导数和二阶导数\n",
    "def logregobj(pred, dtrain):\n",
    "    labels = dtrain.get_label()\n",
    "    pred = 1. / (1+np.exp(-pred)) # sigmoid\n",
    "    grad = pred - labels # 1阶导数\n",
    "    hess = pred*(1-pred) # 2阶导数（海森矩阵），sigmoid的导数：g'(x)=g(x)(1-g(x))\n",
    "    return grad, hess\n",
    "\"\"\"\n",
    "在逻辑回归章节，求导时，梯度=X.T.dot(h_x-y)，为什么上面只写了(h_x-y)\n",
    "这里的p-y.label，其实只是梯度的“公共部分”，因为不同的样本要乘以各自的样本值，\n",
    "所以是无法提前算出来的，因此往往用p-y.label计算梯度的系数部分，实际拿到样本，再乘以各自的数据。\n",
    "\"\"\"\n",
    "\n",
    "def evalerror(pred, dtrain):\n",
    "    labels = dtrain.get_label()\n",
    "    error_num = float(sum(labels!=(pred>0.))) # sigmoid函数g(z)>0.5的话，要z>0.\n",
    "    return 'error', error_num/len(labels)\n",
    "\n",
    "# 模型参数\n",
    "param = {'max_depth':2,  # 树深\n",
    "         'eta': 1,\n",
    "         'verbosity': 0,}\n",
    "watch_list = [(dtest, 'eval'), (dtrain, 'train')]\n",
    "number_round = 5 # 通用参数\n",
    "\n",
    "# 自定义目标函数训练\n",
    "model = xgb.train(param, dtrain,\n",
    "                  num_boost_round=number_round,\n",
    "                  evals=watch_list,\n",
    "                  obj=logregobj, # 目标函数\n",
    "                  feval=evalerror) # 评价函数"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "outputs": [
    {
     "data": {
      "text/plain": "   train-rmse-mean  train-rmse-std  test-rmse-mean  test-rmse-std  \\\n0         1.594727        0.001794        1.594569       0.017434   \n1         2.403623        0.005003        2.403480       0.030395   \n2         2.856625        0.011625        2.855065       0.033168   \n3         3.619403        0.016150        3.617983       0.029252   \n4         3.804502        0.030278        3.810880       0.048983   \n\n   train-error-mean  train-error-std  test-error-mean  test-error-std  \n0          0.046522         0.001200         0.046523        0.004801  \n1          0.022263         0.000867         0.022263        0.003470  \n2          0.007063         0.000506         0.007062        0.002023  \n3          0.015200         0.001074         0.015199        0.004295  \n4          0.005988         0.002618         0.005527        0.001488  ",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>train-rmse-mean</th>\n      <th>train-rmse-std</th>\n      <th>test-rmse-mean</th>\n      <th>test-rmse-std</th>\n      <th>train-error-mean</th>\n      <th>train-error-std</th>\n      <th>test-error-mean</th>\n      <th>test-error-std</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>1.594727</td>\n      <td>0.001794</td>\n      <td>1.594569</td>\n      <td>0.017434</td>\n      <td>0.046522</td>\n      <td>0.001200</td>\n      <td>0.046523</td>\n      <td>0.004801</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>2.403623</td>\n      <td>0.005003</td>\n      <td>2.403480</td>\n      <td>0.030395</td>\n      <td>0.022263</td>\n      <td>0.000867</td>\n      <td>0.022263</td>\n      <td>0.003470</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>2.856625</td>\n      <td>0.011625</td>\n      <td>2.855065</td>\n      <td>0.033168</td>\n      <td>0.007063</td>\n      <td>0.000506</td>\n      <td>0.007062</td>\n      <td>0.002023</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>3.619403</td>\n      <td>0.016150</td>\n      <td>3.617983</td>\n      <td>0.029252</td>\n      <td>0.015200</td>\n      <td>0.001074</td>\n      <td>0.015199</td>\n      <td>0.004295</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>3.804502</td>\n      <td>0.030278</td>\n      <td>3.810880</td>\n      <td>0.048983</td>\n      <td>0.005988</td>\n      <td>0.002618</td>\n      <td>0.005527</td>\n      <td>0.001488</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 5折\n",
    "xgb.cv(param, dtrain, number_round, nfold=5, seed=3, obj=logregobj, feval=evalerror)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## 用前n棵树做预测\n",
    "\n",
    "number_round = 5 做了5轮，就会有5棵树产生\n",
    "\n",
    "但是可以仅使用前几棵树做预测"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "('error', 0.04283054003724395)\n"
     ]
    }
   ],
   "source": [
    "pred1 = model.predict(dtest, ntree_limit=1)\n",
    "print(evalerror(pred1, dtest))"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "('error', 0.021725636250775917)\n"
     ]
    }
   ],
   "source": [
    "pred2 = model.predict(dtest, ntree_limit=2)\n",
    "print(evalerror(pred2, dtest))"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "('error', 0.006207324643078833)\n"
     ]
    }
   ],
   "source": [
    "pred3 = model.predict(dtest, ntree_limit=3)\n",
    "print(evalerror(pred3, dtest))"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## 绘制特征重要度"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "outputs": [
    {
     "data": {
      "text/plain": "<AxesSubplot:title={'center':'Feature importance'}, xlabel='F score', ylabel='Features'>"
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/plain": "<Figure size 432x288 with 1 Axes>",
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAgkAAAEWCAYAAADsCgQrAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAnoUlEQVR4nO3df3hcdZ328fdNqFhSASFNdQ2lrWXV0gJCF8yKMQXUYFVQUclWNFasCFq4trBLV7eyuig+LgrPog9bCe2qWHT5vQIRVh1R+bEUttJCDYswQKsFW36ZWKEpn+ePOa3T5CSZaTNzpsn9uq65euac7znfe06myWe+58w5igjMzMzM+tsj6wBmZmZWm1wkmJmZWSoXCWZmZpbKRYKZmZmlcpFgZmZmqVwkmJmZWSoXCWa7SNI/SLo86xxmZiNNvk6CZUlSHpgEbC2a/ZcR8dtd3OZpEfFfu5Zu9yPpfGB6RHw46yxmtvvzSILVgndHxISix04XCCNB0p5Z9r+zdtfcZla7XCRYTZK0r6ROSb+TtF7SP0uqS5a9VtJPJG2StFHSlZL2S5Z9B5gM/KekHkl/J6lV0rp+289LOj6ZPl/S1ZK+K+l5oGOo/lOyni/pu8n0FEkh6WOSnpD0jKTTJf2VpPslPSvp0qJ1OyT9UtKlkp6T9GtJxxUt/wtJN0p6WtLDkj7Rr9/i3KcD/wB8KHntv0rafUzSWkl/kPSIpE8WbaNV0jpJiyQ9lbzejxUtHy/pIkmPJfl+IWl8suxNku5IXtOvJLXuxI/azGqYiwSrVcuBPmA68Ebg7cBpyTIBXwb+AngDcCBwPkBEnAo8zp9HJ/5Pif2dCFwN7AdcOUz/pTgaOBj4EHAx8FngeOAQ4IOS3tqv7W+ABuDzwLWS9k+WXQWsS17rycCXJB07SO5O4EvA95PXfljS5ingXcA+wMeAr0s6omgbrwL2BV4DfBz4hqRXJsv+BTgS+Gtgf+DvgJckvQa4CfjnZP45wDWSJpaxj8ysxrlIsFpwffJp9FlJ10uaBLwTODsieiPiKeDrwCkAEfFwRNwWES9ExO+BrwFvHXzzJbkzIq6PiJco/DEdtP8SfTEi/hQRtwK9wIqIeCoi1gM/p1B4bPMUcHFEbImI7wPdwFxJBwJvBv4+2dYq4HLgI2m5I2JzWpCIuCkifhMFPwNuBd5S1GQL8IWk/5uBHuB1kvYA5gNnRcT6iNgaEXdExAvAh4GbI+LmpO/bgJXJfjOzUcLHMK0WnFR8kqGko4BxwO8kbZu9B/BEsnwScAmFP3SvSJY9s4sZniiaPmio/kv0ZNH05pTnE4qer48dzyB+jMLIwV8AT0fEH/otmz1I7lSSTqAwQvGXFF7H3sDqoiabIqKv6Pkfk3wNwMspjHL0dxDwAUnvLpo3DvjpcHnMbPfhIsFq0RPAC0BDvz9e23wJCGBWRDwt6STg0qLl/b+y00vhDyMAybkF/YfFi9cZrv+R9hpJKioUJgM3Ar8F9pf0iqJCYTKwvmjd/q91h+eS9gKuoTD6cENEbJF0PYVDNsPZCPwJeC3wq37LngC+ExGfGLCWmY0aPtxgNScifkdhSPwiSftI2iM5WXHbIYVXUBgSfy45Nn5uv008CUwrev4Q8HJJcyWNAz4H7LUL/Y+0RmChpHGSPkDhPIubI+IJ4A7gy5JeLulQCucMfHeIbT0JTEkOFQC8jMJr/T3Ql4wqvL2UUMmhlyuAryUnUNZJak4Kj+8C75b0jmT+y5OTIJvKf/lmVqtcJFit+giFP3APUjiUcDXw6mTZPwFHAM9ROHnu2n7rfhn4XHKOwzkR8RxwBoXj+espjCysY2hD9T/S7qZwkuNG4ALg5IjYlCxrB6ZQGFW4Dvj8MNd/+I/k302S7ktGIBYCP6DwOv6GwihFqc6hcGjiHuBp4CvAHkkBcyKFb1P8nsLIwrn4d4rZqOKLKZllSFIHhQs/HZN1FjOz/lz1m5mZWSoXCWZmZpbKhxvMzMwslUcSzMzMLNWouk7CfvvtF9OnT886xgC9vb3U19dnHWMA5yqPc5XHucqTZa577713Y0T4kto2wKgqEiZNmsTKlSuzjjFALpejtbU16xgDOFd5nKs8zlWeLHNJeiyTjq3m+XCDmZmZpXKRYGZmZqlcJJiZmVkqFwlmZmaWykWCmZmZpXKRYGZmZqlcJJiZmVkqFwlmZmaWykWCmZmZpXKRYGZmZqlcJJiZmVkqFwlmZmaWykWCmZmZpXKRYGZmZqlcJJiZmY0Bkg6U9FNJD0p6QNJZw61T0SJB0kJJayVdI+lOSS9IOqdfmzZJ3ZIelnRe0fxjJd0naY2kf5e0ZyWzmpmZjXJ9wKKImAG8CThT0oyhVlBEVCyNpF8DxwMvAgcBJwHPRMS/JMvrgIeAtwHrgHuAduDXwGPAcRHxkKQvAI9FROdQ/U2eNj32+OAlFXo1O2/RrD4uWl17NY5zlce5yuNc5VneVk9ra2smfUu6NyJmZ9K5ZUbSDcClEXHbYG0qNpIg6TJgGnALMC8i7gG29Gt2FPBwRDwSES8CVwEnAgcAL0bEQ0m724D3VyqrmZnZWCJpCvBG4O6h2lWsnI6I0yW1AXMiYuMgzV4DPFH0fB1wNLAR2FPS7IhYCZwMHJi2AUkLgAUADQ0TWTKrb6RewoiZNL7w6aXWOFd5nKs8zlWenp4ecrlc1jFsDJA0AbgGODsinh+qbe2NuQEREZJOAb4uaS/gVmDrIG2XAkuhcLihFocRa3V407nK41zlca7yZHm4wcYOSeMoFAhXRsS1w7XP+n/KenYcIWhK5hERdwJvAZD0duAvh9vY+HF1dF84twIxd00ulyM/rzXrGAM4V3mcqzzOVR6PIlilSRLQCayNiK+Vsk7WX4G8BzhY0lRJLwNOAW4EkNSY/LsX8PfAZZmlNDMz2/29GTgVOFbSquTxzqFWqMpIgqRXASuBfYCXJJ0NzIiI5yV9GvgRUAdcEREPJKudK+ldFAqZ/xcRP6lGVjMzs9EoIn4BqJx1KlokRMSUoqdNg7S5Gbg5Zf65wLmVSWZmZmbDyfpwg5mZmdUoFwlmZmaWykWCmZmZpXKRYGZmZqlcJJiZmVkqFwlmZmaWykWCmZmZpXKRYGZmZqlcJJiZmVkqFwlmZmaWykWCmZnZGCDpQEk/lfSgpAcknTXcOpkUCZIWSlorKSTdL2m1pDskHVbU5ixJa5IXcnYWOc3MzEaRPmBRRMwA3gScKWnGUCsoIqqSbIdOpV8DxwOTKdzX+hlJJwDnR8TRkmYCVwFHAS8CXcDpEfHwUNudPG167PHBSyqcvnyLZvVx0eqq3HCzLM5VHucqj3OVZ3lbPa2trZn0LeneiJidSeeWGUk3AJdGxG2Dtan6SIKky4BpwC3A0RHxTLLoLv58p8g3AHdHxB8jog/4GfC+amc1MzMbjSRNAd4I3D1ku4xGEvLA7IjYWDTvHOD1EXGapDcANwDNwGbgx8DKiPhMyrYWAAsAGhomHrnk4m9V4RWUZ9J4eHJz1ikGcq7yOFd5nKs8U/etY8KECZn0PWfOHI8kjCGSJlD48H1BRFw7VNuaGHOTNAf4OHAMQESslfQV4FagF1gFbE1bNyKWAkuhcLihFocRa3V407nK41zlca7yZHm4wcYOSeOAa4ArhysQoAaKBEmHApcDJ0TEpm3zI6IT6EzafAlYN9y2xo+ro/vCuZWKutNyuRz5ea1ZxxjAucrjXOVxrvLkcrmsI9goJ0kU/q6ujYivlbJOpl+BlDQZuBY4NSIe6ressajN+4DvVT+hmZnZqPFm4FTgWEmrksc7h1oh65GEJcABwDcLBQ59RcfFrpF0ALAFODMins0mopmZ2e4vIn4BqJx1MikSImJKMnla8khr85aqBTIzM7MBfMVFMzMzS+UiwczMzFK5SDAzM7NULhLMzMwslYsEMzMzS+UiwczMzFK5SDAzM7NULhLMzMwslYsEMzMzS+UiwczMzFK5SDAzqxHz58+nsbGRmTNnZh3FDMioSJC0UNJaSc9Iuj+5E9VKSccUtZks6dak3YOSpmSR1cysWjo6Oujq6so6htl2Wd0F8gzgeOBZoDciQtKhwA+A1ydtvg1cEBG3SZoAvDTcRjdv2cqU826qUOSdt2hWHx3OVbJazbW8rT7rCDbKtbS0kM/ns45htl3VRxIkXQZMA24BPhERkSyqByJpMwPYMyJuA4iInoj4Y7WzmpmZjWX689/oKnYq5YHZEbFR0nuBLwONwNyIuFPSSRRuIf0iMBX4L+C8iNiasq0FwAKAhoaJRy65+FvVeRFlmDQentycdYqBnKs8U/etY8KECVnHGKCnp8e5ylDruTZs2MDixYtZtmxZ1fqeM2fOvRExu2od2m4jq8MN20XEdcB1klqAL1I4DLEn8BbgjcDjwPeBDqAzZf2lwFKAydOmx0WrM39JAyya1Ydzla5Wcy1vq6e1tTXrGAPkcjnnKkOt58rn89TX1+Z7zcaemvlNHBG3S5omqQFYB6yKiEcAJF0PvImUIqHY+HF1dF84t+JZy5XL5cjPa806xgDOVZ5cLpd1BDOzqsr0K5CSpktSMn0EsBewCbgH2E/SxKTpscCD2aQ0M6uO9vZ2mpub6e7upqmpic7OIT8XmVVc1iMJ7wc+ImkLsBn4UHIi41ZJ5wA/ToqIe4HaO9nAzGwErVixIusIZjvIpEiIiCnJ5FeSR1qb24BDq5XJzMzMduQrLpqZmVkqFwlmZmaWykWCmZmZpXKRYGZmZqlcJJiZmVkqFwlmZmaWykWCmZmZpXKRYGZmZqlcJJiZmVkqFwlmZjVi/vz5NDY2MnPmzKyjmAEZFQmSFkpaK+kZSfdLWiVppaRjitp8RdKa5PGhLHKamVVTR0cHXV1dWccw2y6rGzydARwPPAv0RkRIOhT4AfB6SXOBI4DDKdwZMifploh4fqiNbt6ylSnn3VTR4Dtj0aw+OpyrZLWaa3lbfdYRbJRraWkhn89nHcNsu6qPJEi6DJgG3AJ8IrnrI0A9sG16BnB7RPRFRC9wP9BW7axmZmZjmf78N7qKnUp5YHZEbJT0XuDLQCMwNyLulPR24PPA24C9gf8GvhERF6VsawGwAKChYeKRSy6uvTtKTxoPT27OOsVAzlWeqfvWMWHChKxjDNDT0+NcZaj1XBs2bGDx4sUsW7asan3PmTPn3oiYXbUObbeR1eGG7SLiOuA6SS3AF4HjI+JWSX8F3AH8HrgT2DrI+kuBpQCTp02Pi1Zn/pIGWDSrD+cqXa3mWt5WT2tra9YxBsjlcs5VhlrPlc/nqa+vzfeajT0185s4Im6XNE1SQ0RsjIgLgAsAJH0PeGi4bYwfV0f3hXMrHbVsuVyO/LzWrGMM4FzlyeVyWUcwM6uqTL8CKWm6JCXTR1A4SXGTpDpJByTzDwUOBW7NLqmZWeW1t7fT3NxMd3c3TU1NdHZ2Zh3JxrisRxLeD3xE0hZgM/Ch5JsO44CfJ/XD88CHI6Ivw5xmZhW3YsWKrCOY7SCTIiEipiSTX0ke/Zf/icI3HMzMzCwjvuKimZmZpXKRYGZmZqlcJJiZmVkqFwlmZmaWykWCmZmZpSqpSJD0Wkl7JdOtyV0c96toMjMzM8tUqSMJ1wBbJU2ncAnkA4HvVSyVmZmZZa7UIuGl5GJG7wX+NSLOBV5duVhmZmaWtVKLhC2S2oGPAj9M5o2rTCQzMzOrBaUWCR8DmoELIuJRSVOB71QulpmZmWWtpCIhIh4E/h64L3n+aEQMuJyymZntvPnz59PY2MjMmTOzjmIGlP7thncDq4Cu5Pnhkm7c2U6Tb0eslRSS7pe0WtIdkg5Llr9O0qqix/OSzt7Z/szMdgcdHR10dXVlHcNsu1Jv8HQ+cBSQA4iIVZKm7UK/ZwDHA5OBtRHxjKQTKHxz4uiI6AYOB5BUB6wHrhtuo5u3bGXKeTftQqzKWDSrjw7nKlmt5lreVp91BBvlWlpayOfzWccw267kExcj4rl+817amQ4lXQZMA26hUBA8kyy6C2hKWeU44DcR8djO9GdmZmY7p9SRhAck/Q1QJ+lgYCFwx850GBGnS2oD5kTExqJFH6dQOPR3CjDoTdYlLQAWADQ0TGTJrL6diVVRk8YXPh3XGucqT09PD7lcLusYAzhXeWo914YNG+jt7a3JjDb2KCKGbyTtDXwWeHsy60fAP0fEn3aqUykPzN5WJEiaA3wTOCYiNhW1exnwW+CQiHhyuO1OnjY99vjgJTsTqaIWzerjotWl1mPV41zlWd5WT2tra9YxBsjlcs5VhlrPlc/nede73sWaNWuq1rekeyNidtU6tN3GsL+Jk3MCboqIORQKhREl6VDgcuCE4gIhcQJwXykFAsD4cXV0Xzh3pCPuslwuR35ea9YxBnCu8viTnZmNNcOekxARW4GXJO070p1LmgxcC5waEQ+lNGlniEMNZmajSXt7O83NzXR3d9PU1ERnZ2fWkWyMK3VMtwdYLek2oHfbzIhYuIv9LwEOAL4pCaBv25CXpHrgbcAnd7EPM7PdwooV/kxktaXUIuHa5DEiImJKMnla8khr00uhgDAzM7MMlFQkRMS/VzqImZmZ1ZaSigRJjwIDvgYREbtyQSUzMzOrYaUebij+aszLgQ8A+498HDMzM6sVpd7gaVPRY31EXAzU3ncNzczMbMSUerjhiKKne1AYWai9q92YmZnZiCn1D/1FRdN9wKPAB0c+jpmZmdWKUouEj0fEI8UzJE2tQB4zMzOrEaXeBfLqEueZmZnZKDHkSIKk1wOHAPtKel/Ron0ofMvBzMzMRqnhDje8DngXsB/w7qL5fwA+UaFMZmZmVgOGLBIi4gbgBknNEXFnlTKZmY1J8+fP54c//CGNjY1VvVW02WBKPSfhfySdKembkq7Y9hhqBUkLJa2VdI2kOyW9IOmcfm2ukPSUpDX95n9A0gOSXpLke5yb2ZjQ0dFBV1dX1jHMtiv12w3fAX4NvAP4AjAPWDvMOmcAxwMvAgcBJ6W0WQ5cCny73/w1wPuAfysxHwCbt2xlynk3lbNKVSya1UeHc5WsVnMtb6vPOoKNci0tLeTz+axjmG1X6kjC9Ij4R6A3udnTXODowRpLugyYBtwCzIuIe4At/dtFxO3A0ynz10ZEd4nZzMzMrAJKHUnY9gf+WUkzgQ1A42CNI+J0SW3AnIjYuIsZhyRpAbAAoKFhIktm9VWyu50yaXzh03Gtca7y9PT0kMvlso4xgHOVp9Zzbdiwgd7e3prMaGNPqUXCUkmvBP4RuBGYACypWKoyRMRSYCnA5GnT46LVtXe16EWz+nCu0tVqruVt9bS2tmYdY4BcLudcZaj1XPl8nvr62nyv2dhT0m/iiLg8mfwZhcMINWn8uDq6L6y9+07lcjny81qzjjGAc5XHn+zMbKwp6ZwESZMkdUq6JXk+Q9LHKxvNzGxsaW9vp7m5me7ubpqamujs7Mw6ko1xpY7pLgeWAZ9Nnj8EfB8Y9h0s6VXASgpXaXxJ0tnAjIh4XtIKoBVokLQO+HxEdEp6L/CvwETgJkmrIuIdJb8qM7Pd0IoVK7KOYLaDUouEhoj4gaTFABHRJ2nrUCtExJSip02DtGkfZP51wHUlZjMzM7MKKPUrkL2SDgACQNKbgOcqlsrMzMwyV+pIwt9S+FbDayX9ksJhgJMrlsrMzMwyN9xdICdHxOMRcZ+kt1K44ZOA7ogYcHEkMzMzGz2GO9xwfdH09yPigYhY4wLBzMxs9BuuSFDRdM1eH8HMzMxG3nBFQgwybWZmZqPccCcuHibpeQojCuOTaZLnERH7VDSdmZmZZWbIIiEi6qoVxMzMzGpLqddJMDMzszHGRYKZmZmlcpFgZlYj5s+fT2NjIzNnzsw6ihmQUZEgaaGktZKulNQqaZWkByT9LFn+umTetsfzyY2hzMxGrY6ODrq6urKOYbZdqZdlHmlnAMcDPcAdQFtEPC6pESAiuoHDASTVAesp4YZPm7dsZcp5N1Uq805bNKuPDucqWa3mWt5Wn3UEG+VaWlrI5/NZxzDbrupFgqTLKFyY6RbgKuDaiHgcICKeSlnlOOA3EfFY9VKamZmZIqp/jSRJeWA28DlgHHAI8Argkoj4dr+2VwD3RcSlg2xrAbAAoKFh4pFLLv5WBZPvnEnj4cnNWacYyLnKM3XfOiZMmJB1jAF6enqcqwy1nmvDhg0sXryYZcuWVa3vOXPm3BsRs6vWoe02sjrcUNz/kRRGC8YDd0q6KyIeApD0MuA9wOLBNhARS4GlAJOnTY+LVmf9kgZaNKsP5ypdreZa3lZPa2tr1jEGyOVyzlWGWs+Vz+epr6/N95qNPVn/Jl4HbIqIXqBX0u3AYcBDyfITKIwiPFnKxsaPq6P7wrmVSboLcrkc+XmtWccYwLnKk8vlso5gZlZVWX8F8gbgGEl7StobOBpYW7S8HViRSTIzsyprb2+nubmZ7u5umpqa6OzszDqSjXGZjiRExFpJXcD9wEvA5RGxBkBSPfA24JMZRjQzq5oVK/yZyGpLJkVCREwpmv4q8NWUNr3AAVWMZWZmZkWyPtxgZmZmNcpFgpmZmaVykWBmZmapXCSYmZlZKhcJZmZmlspFgpmZmaVykWBmZmapXCSYmZlZKhcJZmZmlspFgplZjZg/fz6NjY3MnDkz6yhmQIWLBEkLJa2VdI2kOyW9IOmcfm3aJHVLeljSeUXzr0zmr5F0haRxlcxqZpa1jo4Ourq6so5htl2lRxLOoHCTpk8BC4F/KV4oqQ74BoVbQs8A2iXNSBZfCbwemAWMB06rcFYzs0y1tLSw//77Zx3DbLuK3eBJ0mXANOAW4IqI+Lqkuf2aHQU8HBGPJOtcBZwIPBgRNxdt67+BpuH63LxlK1POu2mkXsKIWTSrjw7nKlmt5lreVp91BDOzqqpYkRARp0tqA+ZExMZBmr0GeKLo+Trg6OIGyWGGU4Gz0jYgaQGwAKChYSJLZvXtavQRN2l84Q9frXGu8vT09JDL5bKOMYBzlafWc23YsIHe3t6azGhjTya3ii7TN4HbI+LnaQsjYimwFGDytOlx0erae0mLZvXhXKWr1VzL2+ppbW3NOsYAuVzOucpQ67ny+Tz19bX5XrOxJ+vfxOuBA4ueNyXzAJD0eWAi8MlSNjZ+XB3dF/Y/opG9XC5Hfl5r1jEGcK7y+JOdmY01WX8F8h7gYElTJb0MOAW4EUDSacA7gPaIeCnDjGZmVdHe3k5zczPd3d00NTXR2dmZdSQb46oykiDpVcBKYB/gJUlnAzMi4nlJnwZ+BNRROMHxgWS1y4DHgDslAVwbEV+oRl4zsyysWLEi6whmO6hokRARU4qepn47IfkWw80p87M+FGJmZjamZX24wczMzGqUiwQzMzNL5SLBzMzMUrlIMDMzs1QuEszMzCyViwQzMzNL5SLBzMzMUrlIMDMzs1QuEszMzCyViwQzMzNL5SLBzKxGzJ8/n8bGRmbOnJl1FDMgoyJB0kJJayVdKalV0ipJD0j6WVGbNkndkh6WdF4WOc3Mqqmjo4Ourq6sY5htl9VNlM4Ajgd6gDuAtoh4XFIjgKQ64BvA24B1wD2SboyIB4fa6OYtW5ly3k2VTb4TFs3qo8O5SlaruZa31WcdwUa5lpYW8vl81jHMtqv6SIKky4BpwC3AmRRuAf04QEQ8lTQ7Cng4Ih6JiBeBq4ATq53VzMxsLFNEVL9TKQ/MBj4HjAMOAV4BXBIR35Z0MoXRhdOS9qcCR0fEp1O2tQBYANDQMPHIJRd/qzovogyTxsOTm7NOMZBzlWfqvnVMmDAh6xgD9PT0OFcZaj3Xhg0bWLx4McuWLata33PmzLk3ImZXrUPbbWR1uKG4/yOB44DxwJ2S7ipnAxGxFFgKMHna9LhoddYvaaBFs/pwrtLVaq7lbfW0trZmHWOAXC7nXGWo9Vz5fJ76+tp8r9nYk/Vv4nXApojoBXol3Q4clsw/sKhdE7B+uI2NH1dH94VzKxJ0V+RyOfLzWrOOMYBzlSeXy2UdwcysqrL+CuQNwDGS9pS0N3A0sBa4BzhY0lRJLwNOAW7MMKeZWcW1t7fT3NxMd3c3TU1NdHZ2Zh3JxrhMRxIiYq2kLuB+4CXg8ohYAyDp08CPgDrgioh4ILukZmaVt2LFiqwjmO0gkyIhIqYUTX8V+GpKm5uBm6sYy8zMzIpkfbjBzMzMapSLBDMzM0vlIsHMzMxSuUgwMzOzVC4SzMzMLJWLBDMzM0vlIsHMzMxSuUgwMzOzVC4SzMzMLJWLBDMzM0vlIsHMrEbMnz+fxsZGZs6cmXUUMyCjIkHSQklrJa2X9JykVcljSVGb/SRdLenXSdvmLLKamVVLR0cHXV1dWccw2y6ru0CeARwPTAfOiYh3pbS5BOiKiJOT20XvPdxGN2/ZypTzbhrZpCNg0aw+OpyrZLWaa3lbfdYRbJRraWkhn89nHcNsu6qPJEi6DJgG3AK8cZA2+wItQCdARLwYEc9WK6OZmZmBIqL6nUp5YDYwE7gGWAf8lsKowgOSDgeWAg8ChwH3AmdFRG/KthYACwAaGiYeueTib1XjJZRl0nh4cnPWKQZyrvJM3beOCRMmZB1jgJ6eHucqQ63n2rBhA4sXL2bZsmVV63vOnDn3RsTsqnVou42sDjdscx9wUET0SHoncD1wMIVcRwCfiYi7JV0CnAf8Y/8NRMRSCgUFk6dNj4tWZ/2SBlo0qw/nKl2t5lreVk9ra2vWMQbI5XLOVYZaz5XP56mvr833mo09mf4mjojni6ZvlvRNSQ0URhbWRcTdyeKrKRQJQxo/ro7uC+dWJuwuyOVy5Oe1Zh1jAOcqTy6XyzqCmVlVZfoVSEmvkqRk+qgkz6aI2AA8Iel1SdPjKBx6MDMbtdrb22lubqa7u5umpiY6OzuzjmRjXNZjuicDn5LUB2wGTok/nyTxGeDK5JsNjwAfyyijmVlVrFixIusIZjvIpEiIiCnJ5KXJI63NKgonN5qZmVkGfMVFMzMzS+UiwczMzFK5SDAzM7NULhLMzMwslYsEMzMzS+UiwczMzFK5SDAzM7NULhLMzMwslYsEMzMzS+UiwczMzFK5SDAzM7NULhLMzMwslYsEMzMzS+UiwczMzFIpIrLOMGIk/QHozjpHigZgY9YhUjhXeZyrPM5VnixzHRQREzPq22rYnlkHGGHdETE76xD9SVrpXKVzrvI4V3mcy6x0PtxgZmZmqVwkmJmZWarRViQszTrAIJyrPM5VHucqj3OZlWhUnbhoZmZmI2e0jSSYmZnZCHGRYGZmZql2myJBUpukbkkPSzovZflekr6fLL9b0pSiZYuT+d2S3lHFTH8r6UFJ90v6saSDipZtlbQqedw4UpnKyNYh6fdFGU4rWvZRSf+bPD5a5VxfL8r0kKRni5ZVZJ9JukLSU5LWDLJckv5vkvl+SUcULavkvhou17wkz2pJd0g6rGhZPpm/StLKKudqlfRc0c9qSdGyIX/+Fc51blGmNcn7af9kWSX314GSfpr8LnhA0lkpbTJ5j5kNKyJq/gHUAb8BpgEvA34FzOjX5gzgsmT6FOD7yfSMpP1ewNRkO3VVyjQH2DuZ/tS2TMnznoz3Vwdwacq6+wOPJP++Mpl+ZbVy9Wv/GeCKSu8zoAU4AlgzyPJ3ArcAAt4E3F3pfVVirr/e1h9wwrZcyfM80JDR/moFfrirP/+RztWv7buBn1Rpf70aOCKZfgXwUMr/x0zeY374MdxjdxlJOAp4OCIeiYgXgauAE/u1ORH492T6auA4SUrmXxURL0TEo8DDyfYqnikifhoRf0ye3gU0jUC/I5JtCO8AbouIpyPiGeA2oC2jXO3AihHqe1ARcTvw9BBNTgS+HQV3AftJejWV3VfD5oqIO5J+oYrvrxL212B25X050rmq8t4CiIjfRcR9yfQfgLXAa/o1y+Q9Zjac3aVIeA3wRNHzdQz8T7a9TUT0Ac8BB5S4bqUyFfs4hU8K27xc0kpJd0k6aQTy7Ey29ydDm1dLOrDMdSuZi+TQzFTgJ0WzK7nPhjJY7kruq3L1f38FcKukeyUtyCBPs6RfSbpF0iHJvJrYX5L2pvCH9pqi2VXZXyocBn0jcHe/RbvDe8zGoNF2WeaaJOnDwGzgrUWzD4qI9ZKmAT+RtDoiflPFWP8JrIiIFyR9ksIozLFV7H84pwBXR8TWonlZ77OaJGkOhSLhmKLZxyT7qhG4TdKvk0/a1XAfhZ9Vj6R3AtcDB1ep71K8G/hlRBSPOlR8f0maQKEwOTsinh/JbZtVyu4ykrAeOLDoeVMyL7WNpD2BfYFNJa5bqUxIOh74LPCeiHhh2/yIWJ/8+wiQo/DpYqQMmy0iNhXluRw4stR1K5mryCn0Gw6u8D4bymC5K7mvSiLpUAo/vxMjYtO2+UX76ingOkbmEFtJIuL5iOhJpm8GxklqoAb2V2Ko91ZF9pekcRQKhCsj4tqUJjX7HrMxLuuTIkp5UBjxeITC8PO2E54O6dfmTHY8cfEHyfQh7Hji4iOMzImLpWR6I4UTtQ7uN/+VwF7JdAPwv4zsCVylZHt10fR7gbuS6f2BR5OMr0ym969WrqTd6ymcSKYq7rMpDH4i3lx2PKnsvyu9r0rMNZnCOTZ/3W9+PfCKouk7gLYq5nrVtp8dhT+2jyf7rqSff6VyJcv3pXDeQn219lfy2r8NXDxEm8zeY374MdRjtzjcEBF9kj4N/IjCGdJXRMQDkr4ArIyIG4FO4DuSHqbwS+CUZN0HJP0AeBDoA86MHYewK5npq8AE4D8K51DyeES8B3gD8G+SXqIwmnNhRDy4q5nKzLZQ0nso7JOnKXzbgYh4WtIXgXuSzX0hdhyWrXQuKPzsroqI4suBVmyfSVpB4Yz8BknrgM8D45LMlwE3Uzj7/GHgj8DHkmUV21cl5lpC4bybbybvr74o3EVwEnBdMm9P4HsR0VXFXCcDn5LUB2wGTkl+lqk//yrmgkJBfGtE9BatWtH9BbwZOBVYLWlVMu8fKBR5mb7HzIbjyzKbmZlZqt3lnAQzMzOrMhcJZmZmlspFgpmZmaVykWBmZmapXCSYmZlZqt3iK5BmWZO0FVhdNOukiMhnFMfMrCr8FUizEkjqiYgJVexvzyjcg8TMLDM+3GA2AiS9WtLtklZJWiPpLcn8Nkn3JTc7+nEyb39J1yc317orubQyks6X9B1Jv6RwYbCJkq6RdE/yeHOGL9HMxiAfbjArzfiiq+U9GhHv7bf8b4AfRcQFkuqAvSVNBL4FtETEo5L2T9r+E/A/EXGSpGMpXLL38GTZDAo3G9os6XvA1yPiF5ImU7hS4Rsq9grNzPpxkWBWms0RcfgQy+8Brkhu5HN9RKyS1ArcHhGPQuESu0nbY4D3J/N+IukASfsky26MiM3J9PHAjORywQD7SJoQyc2TzMwqzUWC2QiIiNsltVC4Uc9ySV8DntmJTRXfU2AP4E0R8aeRyGhmVi6fk2A2AiQdBDwZEd+icOvmI4C7gBZJU5M22w43/ByYl8xrBTZGxPMpm70V+ExRH4dXKL6ZWSqPJJiNjFbgXElbgB7gIxHxe0kLgGsl7QE8BbwNOJ/CoYn7Kdzx76ODbHMh8I2k3Z7A7cDpFX0VZmZF/BVIMzMzS+XDDWZmZpbKRYKZmZmlcpFgZmZmqVwkmJmZWSoXCWZmZpbKRYKZmZmlcpFgZmZmqf4/zUie7OqFf1gAAAAASUVORK5CYII=\n"
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "%matplotlib inline\n",
    "from xgboost import plot_importance\n",
    "from matplotlib import pyplot as plt\n",
    "\n",
    "plot_importance(model, max_num_features=10)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## 与sklearn组合使用\n",
    "\n",
    "使用sklearn中自带的数据集，如手写数字（10分类问题）、鸢尾花（3分类问题）\n",
    "\n",
    "sklearn中的KFold、train_test_split等都可以组合使用\n",
    "\n",
    "下面xgboost的使用，也主要使用sklearn风格的API，例如xxClassifier，xxRegressor、fit、predict等"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "dict_keys(['data', 'target', 'frame', 'feature_names', 'target_names', 'images', 'DESCR'])\n",
      "(1797, 64)\n",
      "(1797,)\n"
     ]
    }
   ],
   "source": [
    "import pickle\n",
    "import xgboost as xgb\n",
    "import numpy as np\n",
    "\n",
    "from sklearn.model_selection import KFold, train_test_split, GridSearchCV\n",
    "from sklearn.metrics import confusion_matrix, mean_squared_error\n",
    "from sklearn.datasets import load_iris, load_digits, load_boston\n",
    "\n",
    "# 10分类问题\n",
    "# 用XGBoost建模，用sklearn做评估，这里使用混淆矩阵进行评估\n",
    "# 加载数据(手写数字)\n",
    "digits = load_digits()\n",
    "print(digits.keys())\n",
    "y = digits['target']\n",
    "X = digits['data']\n",
    "print(X.shape) # (1797, 64)\n",
    "print(y.shape)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 78   0   0   0   0   0   0   0   1   0]\n",
      " [  0  90   0   0   0   1   0   0   0   2]\n",
      " [  0   1  82   0   0   0   3   0   0   0]\n",
      " [  0   0   1  89   0   0   0   1   0   3]\n",
      " [  2   0   0   0 101   0   1   2   1   0]\n",
      " [  0   0   0   1   0  96   2   0   0   3]\n",
      " [  0   3   0   0   1   0  83   0   1   0]\n",
      " [  0   0   0   0   1   0   0  86   0   1]\n",
      " [  0   6   1   3   0   1   0   0  71   0]\n",
      " [  0   0   0   0   1   1   0   5   2  71]]\n",
      "\n",
      "[[97  0  0  0  0  1  0  1  0  0]\n",
      " [ 0 86  0  1  0  0  1  0  0  1]\n",
      " [ 0  0 90  1  0  0  0  0  0  0]\n",
      " [ 0  1  0 86  0  1  0  0  1  0]\n",
      " [ 0  1  0  0 72  0  0  0  0  1]\n",
      " [ 1  0  0  0  0 72  0  0  2  5]\n",
      " [ 1  0  0  0  0  1 91  0  0  0]\n",
      " [ 0  0  0  0  0  0  0 90  1  0]\n",
      " [ 1  2  0  0  0  1  0  2 85  1]\n",
      " [ 0  6  0  1  0  1  0  0  1 91]]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# K折切分器\n",
    "kf = KFold(n_splits=2, shuffle=True, random_state=1234) # 2折\n",
    "for train_index, test_index in kf.split(X):\n",
    "    # 这里model没做任何param的设定，全使用默认值\n",
    "    xgb_model = xgb.XGBClassifier().fit(X[train_index], y[train_index])\n",
    "    pred = xgb_model.predict(X[test_index])\n",
    "    ground_truth = y[test_index]\n",
    "    print(confusion_matrix(ground_truth, pred))\n",
    "    print()"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[25  0  0]\n",
      " [ 0 24  1]\n",
      " [ 0  0 25]]\n",
      "\n",
      "[[25  0  0]\n",
      " [ 0 23  2]\n",
      " [ 0  1 24]]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# 3分类问题（鸢尾花）\n",
    "iris = load_iris()\n",
    "y_iris = iris['target']\n",
    "X_irsi = iris['data']\n",
    "\n",
    "kf = KFold(n_splits=2, shuffle=True, random_state=1234) # 2折\n",
    "for train_index, test_index in kf.split(X_irsi):\n",
    "    xgb_model = xgb.XGBClassifier().fit(X_irsi[train_index], y_iris[train_index])\n",
    "    pred = xgb_model.predict(X_irsi[test_index])\n",
    "    ground_truth = y_iris[test_index]\n",
    "    print(confusion_matrix(ground_truth, pred))\n",
    "    print()"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "mse: 11.431804163616869\n",
      "\n",
      "mse: 15.365480950058584\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# 回归问题（boston房价预测）\n",
    "boston = load_boston()\n",
    "# print(type(boston))\n",
    "X_boston = boston['data']\n",
    "y_boston = boston['target']\n",
    "\n",
    "kf = KFold(n_splits=2, shuffle=True, random_state=1234) # 2折\n",
    "for train_index, test_index in kf.split(X_boston):\n",
    "    # 这里使用回归器：XGBRegressor\n",
    "    xgb_model = xgb.XGBRegressor().fit(X_boston[train_index], y_boston[train_index])\n",
    "    pred = xgb_model.predict(X_boston[test_index])\n",
    "    ground_truth = y_boston[test_index]\n",
    "    # 回归问题，所以评估换成mse\n",
    "    print('mse:', mean_squared_error(ground_truth, pred))\n",
    "    print()"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## 优化超参数-网格搜索"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "source": [
    "# 回归问题（boston房价预测）\n",
    "boston = load_boston()\n",
    "X_boston = boston['data']\n",
    "y_boston = boston['target']\n",
    "xgb_model = xgb.XGBRegressor()\n",
    "\n",
    "# 参数字典\n",
    "param_dict = {'max_depth': [2,4,6], # 最大树深\n",
    "              'n_estimators': [50, 100, 200]} # 树的棵树\n",
    "\n",
    "rgs = GridSearchCV(xgb_model, param_dict)\n",
    "rgs.fit(X_boston, y_boston)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "execution_count": 40,
   "outputs": [
    {
     "data": {
      "text/plain": "GridSearchCV(estimator=XGBRegressor(base_score=None, booster=None,\n                                    colsample_bylevel=None,\n                                    colsample_bynode=None,\n                                    colsample_bytree=None, gamma=None,\n                                    gpu_id=None, importance_type='gain',\n                                    interaction_constraints=None,\n                                    learning_rate=None, max_delta_step=None,\n                                    max_depth=None, min_child_weight=None,\n                                    missing=nan, monotone_constraints=None,\n                                    n_estimators=100, n_jobs=None,\n                                    num_parallel_tree=None, random_state=None,\n                                    reg_alpha=None, reg_lambda=None,\n                                    scale_pos_weight=None, subsample=None,\n                                    tree_method=None, validate_parameters=None,\n                                    verbosity=None),\n             param_grid={'max_depth': [2, 4, 6],\n                         'n_estimators': [50, 100, 200]})"
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'max_depth': 2, 'n_estimators': 100}\n"
     ]
    }
   ],
   "source": [
    "print(rgs.best_params_)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}